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ABSTRACT. A major challenge in the analysis of population genomics data consists of 
isolating signatures of natural selection from background noise caused by random drift 
and gene flow. Analyses of massive amounts of data from many related populations re- 
quire high-performance algorithms to determine the likelihood of different demographic 
scenarios that could have shaped the observed neutral single nucleotide polymorphism 
(SNP) allele frequency spectrum. In many areas of applied mathematics, Fourier Trans- 
forms and Spectral Methods are firmly established tools to analyze spectra of signals and 
model their dynamics as solutions of certain Partial Differential Equations (PDEs). When 
spectral methods are applicable, they have excellent error properties and are the fastest pos- 
sible in high dimension; see 1151 . In this paper we present an explicit numerical solution, 
using spectral methods, to the forward Kolmogorov equations for a Wright-Fisher process 
with migration of K populations, influx of mutations, and multiple population splitting 
events. 



1. Introduction 

Natural selection is the force that drives the fixation of advantageous phenotypic traits, 
and represses the increase in frequency of deleterious ones. The growing amount of 
genome-wide sequence and polymorphism data motivates the development of new tools 
for the study of natural selection. Distinguishing between genuine selection and the effect 
of demographic history, such as gene-flow and population bottlenecks, on genetic varia- 
tion presents a major technical challenge. A traditional population genetics approach to 
the problem focuses on computing neutral allele frequency spectra to infer signatures of 
natural selection as deviations from neutrality. Diffusion theory provides a set of classical 
techniques to predict such frequency spectra [8, 21, 4|, while the connection between dif- 
fusion and the theory of Partial Differential Equations (PDEs) allows for borrowing well 
established high-perfomance algorithms from applied mathematics. 

The theory of predicting the frequency spectrum under irreversible mutation was de- 
veloped by Fisher, Wright and Kimura ||6] |22] [T3) . In particular Kimura lfl4l noted that 
this theory was applicable to many nucleotide positions and introduced the infinite sites 
model. The joint frequency spectra of neutral alleles can be obtained from the coalescent 
model |20| or by Monte-Carlo simulations ifTTl . The analysis in terms of diffusion theory 
is mathematically simpler and can incorporate natural selection easily EH HI . ^ n tn ^ s 
paper, we model the demographic history of K different populations that are descended by 
K — 1 population splitting events from a common ancestral population. The populations 
evolve with gene exchange under an infinite sites mutation model. We introduce a powerful 
numerical scheme to solve the associated forward diffusion equations. After introducing 
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Figure 1 . A graphical representation of a model for the demographic 
history of three populations. 



a regularized discretization of the problem, we show how spectral methods are applied to 
compute theoretical Non-Equilibrium Frequency Spectra. 

The introduction of spectral methods is usually attributed to [fl~6j, although they are 
based on older precursors, such as finite element methods, and Ritz methods in quantum 
mechanics 0171 . The basic idea consists of using finite truncations of expansions by com- 
plete bases of functions to approximate the solutions of a PDE. This truncation allows the 
transformation of a diffusion PDE into a finite system of Ordinary Differential Equations 
(ODEs). The motivation to use these methods relies on their excellent error properties, and 
their high speed. In general, they are the preferred methods when the dimension of the 
domain is high |Q3), and the solutions to the PDE are smooth. This is because the number 
of basis functions that one needs to approximate the solutions of a PDE is much lower than 
the number of grid-points that one needs in a finite difference scheme, working at the same 
level of accuracy Q . 

As we show in this paper, the convergence of spectral methods depends on the smooth- 
ness of the solutions to be approximated. In many situations, solutions to diffusion equa- 
tions have good analytical properties, and spectral methods can be applied. However, the 
application of these methods to the problem that interests us here requires a proper dis- 
cretization of the problem. Influx of mutations, population splitting events and boundary 
conditions have to be properly regularized before one applies these methods and exploits 
their high-perfomance properties. 

1.1. Non-Equilibrium Frequency Spectra. The A'-dimensional Allele Frequency Spec- 
trum (AFS) summarizes the joint allele frequencies in K populations. We distinguish 
between the AFS, which consists of the unknown distribution of allele frequencies in K 
natural populations, and observations of the AFS. Given DNA sequence data from multi- 
ple individuals in K populations, the resulting observation of the AFS is a if-dimensional 
matrix with the allele counts (for a complete discussion on this see 1201 ). Each entry of 
the matrix consists of the number of diallelic polymorphisms in which the derived allele 
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was found. In other words, each entry of the AFS matrix is the observed number of de- 
rived alleles, j a , found in the corresponding number of samples, n a , from population a 

(1 < a < K). 

The full AFS is the real distribution of joint allele frequencies at the time when the 
samples were collected. If the total number of diploid individuals in the a" 1 -population 
is N a n a , the natural allele frequencies x\ = ii/(2N\), x-x = iz/i^Nz), ■ ■■Xk = 
ix /{2Nk ) (with i a the total number of derived polymorphisms in the a th population) can 
be seen as points in the A'-dimensional cube [0, l] K . Thus, given the frequencies of every 
diallelic polymorphism (which we indexed by r) x\, x\, . . . , x r K , the AFS can be expressed 
as the probability density function 

1 S 

(1.1) <j>{x) = — s ( x i - x\)8{x 2 - x r 2 ) ■ ■ ■ 5{x K - x r K ). 

b r=l 

Here, S is the total number of diallelic polymorphisms segregating in the K populations, 
and S( ) is the Dirac delta function. 

Our goal is to determine this AFS under the infinite sites model. Any demographic 
scenario in the model is defined through a population tree topology T, such as in Fig. Q] 
and a set of parameters that specify the effective population sizes N e>a , splitting times tA, 
and migration rates m a b at different times. Hence, 2N e ,afnab is defined as the number of 
haploid genomes that the population a receives from b per generation. For simplicity, we 
refer to the set of parameters that specify a unique demographic scenario as 0. 

Thus, given a population tree topology and a choice of parameters, we will compute 
theoretical densities of derived joint allele frequencies as functions on [0, l] K of the type 

A-l A-1 A-1 

(1.2) <t>(x\Q,T) = J2 E'" E a iui2 ,..., iK {Q,T)R il {x 1 )R i M---R iK {x K ), 

ii= 0«2— iK—0 

with A a truncation parameter, {Ri(x)}°^L a complete basis of functions on the Hilbert 
space L 2 [0, 1] to be defined below, and ai x ...i K the coefficients associated with the projec- 
tion of <p(x\Q, T) onto the basis spanned by {Ri 1 {x)Ri 2 (x) ■ ■ ■ Ri K (x)}. These continu- 
ous densities relate to the expectation of an observation of the AFS via standard binomial 
sampling formulae 
(1.3) 

f K n ! 
p{ji,---3K\ni,...n K ) = / <f>{x\®,T) TT a ' . x ] a a {l - x a ) na ~ Ja dx a . 

J[0,1] K ^=1 ~ 

Using Eq. ( 11.3b we can compare model and data, for instance, by means of maximum 
likelihood. 

The major goals of this paper are twofold. First, we present the finite Markov chain 
and diffusion approximation, associated with the infinite sites model used to compute neu- 
tral allele spectra. A special emphasis is made on the boundary conditions and the influx 
of mutations, because of their potential singular behavior. Second, we introduce spectral 
methods and show how to transform the diffusion equations into coupled systems of Or- 
dinary Differential Equations (ODEs) that can be integrated numerically. In particular, we 
introduce a set of techniques to handle population splitting events, mutations and boundary 
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interactions, that protect the numerical setup against Gibbs phenomena^ A detailed analy- 
sis of the stability of the methods as a function of the model parameters, and the control of 
the numerical error, are included at the end of the paper. 



2. Finite Markov chain model 

The evolutionary dynamics of diallelic SNP frequencies in a randomly-mating diploid 
population can be modeled using a finite Markov chain, with discrete time t represent- 
ing non-overlapping generations. For simplicity, we consider first one population with N 
diploid individuals, and later will extend the results to more than one population. 

The state of the Markov chain at time t is described by the vector fj(t), with 1 < j < 
2N. Each entry, fj(t), consists of the expected number of loci at which the derived state is 
found on j chromosomes. Therefore, fj W ' s me expected number of polymorphic 

loci segregating in the population at time t, and f2N (t) is the expected number of loci fixed 
for the derived state. The model assumes that the total number of sites per individual is so 
large, and the mutation rate per site so low, that whenever a mutation appears, it always 
does so on a previously homoallelic site 04) • 

The vector fj(t), is also called the density of states. Under the assumption of free 
recombination between loci and constant mutation rate, the time evolution of fj (t) under 
random drift and mutation influx is described by the difference equations 

2N 

(2.1) /j(* + l)=£W)/i(*)+/*j(*), 1 <J< 2N - 

i=i 

In its simplest form, one assumes that the alleles in generation t + 1 are derived by sampling 
with replacement from the alleles in generation t. Therefore, the transition coefficients in 
the chain Eq. ( 12. U are 

(2.2) P(j\i) = (i/(2N)y{l - (z/(2iV))} 2Jv ^. 

This describes stochastic changes in the state after a discrete generation, Fig. [2] The second 
term in Eq. ( 12. Il l represents the influx of polymorphisms. Mutations are responsible for 
the creation of new polymorphisms in the population. The influx of mutations depends on 
the expected number of sites 2Nv, in which new mutations appear in the population each 
generation^. If we assume that at each generation, every new mutation is found in just one 
chromosome, then 

(2.3) n j {t)=2Nv5 hj , 

for the mutation alone J4). The term S^j in Eq. ( 12.3b is the Kronecker symbol, with 
5i.j = 1 if j = 1 and 8\ j = otherwise. 



Gibbs phenomena are numerical instabilities that arise when the error between a function and its truncated 
polynomial approximation is large. 

^The expected number of sites 27V i/, relates to the expected number of mutations per base 2Nu, by the total 
length L of the genomic sequence under study in units of base pairs, u = u X L. Sometimes in this paper, in an 
abuse of notation we do not distinguish between v and u, and they are seen as the same quantity expressed with 
different units. 
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2. 1 . Effective Mutation Densities. In applications of the infinite sites model, one usually 
finds that the census population size and the effective population size that drives random 
drift in Eq. (12.21 i are not the same 1141 . For this reason, we distinguish between N e , 
the effective population size that defines the variance of the Wright-Fisher process in Eq. 
( 12.21 i. and the census population size N that can be used to define the allele frequencies 
x = i/(2N). Therefore, the smallest frequency, x = 1/(2N), with which new mutations 
enter populations will be sensitive to small stochastic fluctuations in the census population 
size, even if the effective population size remains constant. This is important when we take 
the diffusion limit of Eq. (12. U . and the stochastic process is described by the continuous 
variable x = j / (27V), rather than the integer j. If we consider a constant census population 
size, the term Eq. (12.31 l in the Markov chain is substituted by 

(2.4) 5 ltj ^5(x-l/(2N)), 

in the diffusion limit. However, if the census population size per generation is a stochastic 
variable distributed as F(N)dN, the diffusion limit of the mutation term will be 

/>oo 

(2.5) 5i,j i — ^ n(x) = S(x- l/(2N))F(N)dN. 

Jo 

We expect that fi(x) will have some general properties, independent of the particular char- 
acteristics of F(N)dN. For instance, in many realistic scenarios /x(x) will be a function 
that is nearly zero for frequencies x > x*, with x* = l/(2iV m j„) a very small characteris- 
tic frequency associated with the inverse of the minimum census population size. 

Other phenomena that might not be properly captured by the simple mutational model 
in Eqs. ( 12.31 and (12.41 , consist of organisms with partially overlapping generations, and 
organisms in which mutations in gametes arise from somatic mutations. When an organ- 
ism has a mating pattern that violates the assumption of non-overlapping generations (e.g. 
humans), the generation time in the model Eq. (12.2b is interpreted as an average generation 
time. Hence, during a generation unit, there is time enough for some individuals to be born 
with new mutations at the beginning of the generation time, and to reproduce themselves 
by the end of a generation unit. This implies that after one average generation, there can 
exist new identical mutations in more than one chromosome. Similarly, when the gametes 
of an organism originate from somatic tissue, they inherit de novo mutations that arose in 
the soma after multiple cell divisions. If the individuals of this organism can have more 
than one offspring per generation, one expects to find the same new mutation, in the same 
site, in more than one chromosome per generation. 

Because of these different biological phenomena, we believe that the notion of effective 
mutation density, fj,(x), is a more general way to describe mutations in natural popula- 
tions. The effective mutation density describes the average frequency distribution of new 
mutations per generation, in one population, after taking into account the effects due to 
stochastic changes in census population size, non-overlapping generations and/or muta- 
tions of somatic origin. From a numerical point of view, effective mutation densities are 
a useful tool to avoid the numerical instabilities associated with polynomial expansions of 
non-smooth functions (e.g. Dirac deltas) that appear in the standard approaches to muta- 
tion influx. As we show later when we discuss the continuous limit of the infinite sites 
model, different effective mutation densities can yield predictions which are identical to 
predictions of models based on Eq. (12.4l i. 

2.2. More than one population. Here, we show how to incorporate arbitrarily more pop- 
ulations, and migration flow between them. Generally, for the state in the chain we consider 
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a discrete random variable X which takes values in the A'-dimensional lattice of derived 
allele frequencies: 

/ ii/(2JVi) \ 
« 2 /(2iV 2 ) 



(2.6) 



X 



V i K /{2N K ) J 

with K the number of populations, and < i a < 2N a . For simplicity, we use a single 
index notation, < I < Y[ a 2-7V a , to label the states where the random variable X takes 
values. The random variable X = / jumps to the state X = J at a discrete generation unit, 
with prescribed probability P(J\I). The density of states in this multi-population setup is 
flit), and the difference equations that describe its dynamics are equivalent to Eq. d2.ll ). 
The transition matrix P = P(J\I) incorporates random drift and migration events between 




Time 



Figure 2. One unit of time transition in a finite Markov chain. 



populations. New mutations enter each population with an effective mutation vector jl 

( o \ 



(2.7) 



M = 



V o / 

in the standard model, the mutation density is /z| = 2N a v8\ i j. 

The Markov chain for a Wright-Fisher process for two independent populations is de- 
fined by the transition matrix Pj 1 j 2 -,i 1 i 2 = 

(2.8) Bi{ji;2N e , 1 ,i 1 /(2N e!l ))Bi(j 2 ;2N e , 2 , i 2 /(2iV e , 2 )), 

where Bi(j; k,p) stands for the binomial distribution with index k and parameter p. Also, 
we can introduce migration between populations, by sampling a constant number of alleles 
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n a b in population a that become part of the allele pool in population b. Thus, in a model of 
two populations with migration, one considers the transition matrix 

P jlh ;i li2 = Bi(Ji -h;2N eA -n 21 ,t 1 /(2N eA ))Bt(k 1 ;n 21 ,i 2 /(2N ea )) x 

k u k 2 =0 

(2.9) xB^0 2 -A :2 ;2iV e , 2 -n 12 ,^ 2 /(2iV e:2 ))B^(fc 2 ;n 12 ,^ 1 /(27V e , 1 )). 

In this model the parameter space is given by the effective population sizes N e ,i and N e ^ 2 , 
and the scaled migration rates n 2 \ and ri\ 2 . This process is generalizable to an arbitrary 
number of populations in a straightforward way. 



3. Diffusion approximation 

Diffusion approximations to finite Markov chains have a distinguished history in pop- 
ulation genetics, dating back to Wright and Fisher. This approach can be used to describe 
the time evolution of derived allele frequencies in several populations, with relatively large 
population sizes. This approximation applies when the population sizes N a are large (if 
N e > 50, the binomial distribution with index 2 N e can be accurately approximated by the 
Gaussian distribution used in the diffusion limit) and migration rates are of order 1 /N e . 

In the large population size limit, the state space spanned by vectors such as Eq. j2.6i 
converges to the continuous space [0, 1) K . The density of states fi(t) on the state space 
will converge to a continuous density 4>(x,t) on [0, l] K . The time evolution of 4>(x,t) 
depends on how the inifinitesimal change SX, 

X(t + 5t) = X(t) + 5X, 

is distributed. If the mean of the SX distribution is M(X, t) and the covariance matrix is 
C(X, t), the time continuous limit St — > + of the process X{t) is well defined. In the 
small, but finite, limit of St the stochastic process obeys the equation 

(3.1) X(t + St) = X(t) + M(X, t)8t + a(X, t)eVSt, 

where e*is a standard normal random variable (with zero mean and unit covariance matrix) 
in H K , cr(X, t) is a square root of the covariance matrix C(X, t) = <j<t t (X, t), and St is 
a finite, but very small, time step. 

In the diffusion approximation to the discrete Markov chain, the process is described as 
a time continuous stochastic process governed by Gaussian jumps of prescribed variance 
and mean. These processes can be denoted using the notation of stochastic differential 
equations: 

K 

(3.2) dX a t = M a (X t ,t)dt + <J ab {Xt,t)dW?, 

b=l 

where dW b is the infinitesimal element of noise given by standard Brownian motion in 
i£T-dimensions, and a is the square root matrix of the covariance matrix C = aa T , lfl9l . 
The diffusion generator associated with Eq. ( 13.21 i is 

K 8 1 K B 2 

(3.3) c = Y M a {x,t)^- + - Y C ab {x,t) ^ a 

^ v ' dx a 2^ v ' ' dx a dx b 

a=l b=l 
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Thus, if <p(x, t = 0) is the density of allele frequencies at time 0, the time evolution of 

4>(x, t) will be governed by the forward Kolmogorov equation 

(3.4) 

^§r= E 5ft^[^(*.*M*.o]-f;^[^(*.*)^«)]+p(*,«). 

a,b=l a=l 

Here, p{x, t) is the continuous limit of fij in Eq. ( 12. lb , that describes the net influx of 
polymorphisms in the population per generation. 

3.0.1. Modeling migration flow and random drift. The continuous limit of the Markov 
chain defined in Eq. (12.91 , in the case of A diploid populations and in the weak migration 
limit, has as associated moments 



(3.5) M a (x,t) =J2 m ab(x b 



!, 

(3.6) C ab (x,t)=S ab Xa ^~ Xa \ 

with S ab the Kronecker delta (5 ab = 1 if a = b and 5 ab = otherwise). The matrix 
element m a b = n a t,/(2N a ) defines the migration rate from the b th population to the a th 
population. 

Thus, associated with this process one has the Kolmogorov forward equations 

® XI -A V* 1 ^ f xab X a{l ~ x a) 

<l>{x,t) = ^-——l8 — — <j>{x,t) 



dt Y\ , i 2dx a dx b V 2 A, 

a,b 

d 

(3.7) — - — {m ab (x b - x a )4>(x,t)) + p(x,t). 

ox a 

Eq. ( 13.7b describes the time evolution of the frequency spectrum density under random 
drift and migration events between populations, given an initial density and absorbing 
boundary conditions (see below). The inhomogenous term p(x, t) models the total in- 
coming/outgoing flow of SNPs per generation into the A'-cube which is not due to the 
diffusion flow, j a = —M a <f) + d b {C ab (f), at the boundary. This total flow depends on 
mutation events that generate de novo SNPs: inflow from higher dimensional components 
of the allele density (see below), inflow from migration events from lower dimensional 
components of the allele density, and the outflow of migration events towards higher di- 
mensional components. If there is not a positive influx of SNPs, the density would converge 
to 4>(x, t) —> as t — s- oo. In order to understand the probability flow between different 
components of the density of alleles, we will have to study how the boundary conditions 
are defined precisely. 

3.1. Boundary Conditions. Understanding the boundary conditions in this problem is 
one of the most challenging tasks. In Kimura's famous solution to the problem of pure 
random drift in one population, Ifl2l . he required the solutions to the diffusion equation to 
be finite at the boundaries x = and x = 1. This boundary condition is absorbing. The 
points x = and x ~ 1 describe states where SNPs reach the fixation of their ancestral or 
derived states. 

If we consider K populations, the natural generalization of Kimura's boundary condi- 
tions can be derived by studying the possible stochastic histories of single diallelic SNPs 
segregating in the K populations. A SNP which is initially polymorphic in all the K pop- 
ulations can reach the fixation of its derived or ancestral state in one population while still 
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being polymorphic in the remaining K — 1 populations. More generally, a SNP can be 
polymorphic in K — a populations, while its state can be fixated in the remaining a pop- 
ulations. A convenient way of visualizing this is to look at the geometry of the A-cube 
of allele frequencies, and the different subdimensional components of its boundary (see 
examples Fig. [3] and Fig. [4] for the 2-cube and 3-cube). A A-cube's boundary can be 
decomposed as a set of cubes of lower dimensionality, from (K — 1) -cubes up to 0-cubes 
or points. The number of boundary components of codimension a, i.e. the number of 
(K — a)-cubes, contained in the boundary of the A'-cube is 

2 a A! 

(3.8) # (A - a) cubes = — —r. 




Figure 3 . Decomposition of the singular probability density, for two 
populations, on the two-dimensional bulk and the different subdimen- 
sional boundary components. 

The most important set of boundary components are the (K — l)-cubes, because any 
other boundary component can be expressed as the intersection of a finite number of (A — 
l)-cubes at the boundary. We identify each 2K codimension-one boundary component by 
the population where the SNPs are not polymorphic, and by the state that is fixated in this 
population (Derived or Ancestral). For example, the component (i, A) is defined as the 
set of points in the A'-cube that obeys the equation Xi = 0, and the component (i, D) is 
defined by the equation Xi = 1. Therefore, any codimension a boundary component can 
be expressed as the intersection 
(3.9) 

(ii Si)n(i 2 S 2 )C\- ■ ■ (i a S a ) = {x G [0, l^lxsj = 6 Si ,d; x i2 = 5s 2 ,d; •■•;x ia = S Sc ,,d}, 

with i a 7^ ip when a ^(5, 6s. d = 1 for the derived state S = D, and 5s d = for the 
ancestral state S = A. 

To each boundary component of codimension a we associate a (K — a)-dimensional 
density of derived allele frequencies that are polymorphic only on the corresponding K — a 
populations, while are fixated in the other a populations. In this way, <f>^ denotes the 
bulk probability density, {4>^' Si ^}iZ^ (with the state Si being either ancestral Si = A or 
derived Si = D) are the 2K codimension-one densities, {^l*' 3 *' 3 ''^}^ the codimension 
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Figure 4. Decomposition of the singular probability density, for three 
populations, on the three dimensional bulk and the different subdimen- 
sional boundary components. 



2 densities, etc. This decomposition is illustrated in the case of 2 and 3 populations in Fig. 
Sand Fig. 

In this notation, we write the density of derived alleles segregating on K populations as 
the generalized probability function 



(x,t) = <t> {0) {x,t) (<t> (i ' A) (x,t)6( Xi ) + ^ D \x,t)5( Xl - 1)) + 

i=l 

K 

J2 (cj ) ^ A ^ A \x,t)S(x l )S(x J ) + ^ A ^ D \x,t)S(x l )6(x J - 1) 



2=1 

K 

(3.10) +4 i ' D ' j ' D ^(x,t)S(x i - l)S( Xj - 1)) + 

with c)(-) the Dirac delta-function. The points (1, A) n (2, A) n ■ ■ ■ (K, A) and (1, D) n 
(2, D) fl • • ■ (K, D) are the universal fixation boundaries, and they do not contribute to 
the total density of alleles in Eq. ( 13.101 ). It is useful to write the probability density cj>(x, t) 
using such decomposition, because despite being a singular generalized function, each 
boundary component <^( 1 > s 'i;j>' s '2'--0 ( X; w j]] b e> most of the time, a regular analytic func- 
tion. 

The dynamics of the boundary components ^i^i^A,...) ^ ^ are governed by diffu- 
sion equations, with an inhomogenous term, of the type 

(3.11) -A ( ? Tl afc (.T b - a;Q )^ 1 ' Sl ^' S --)(x,<)) 
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with p^ 1 ' 31 ' 12 ' 32 '-'-' (x,t) the net incoming/outgoing flow into the boundary component 
(ii,Si) n (12, 5*2) fl . . .. The p term can be interpreted as an interaction term between 
different boundary components. 

More precisely, pC*!,^;^,^;—) (x, t) consists of four terms 

(3.12) p(.^,S 1 ;i a ,S a ;...;i a ,S a )^ X} q = pmut ( Xj t ) + pdrift ( x , t) + p lUm (x, t) + p outm (x, t). 

Pmut(x,t) is the influx of spontaneous mutations (only present in codimension K — 1), 
Pdrift{x, t) consists of the boundary inflow from codimension a — 1 components that have 
(ii, Si) n (ia, S2) n . . . n (i a , S a ) as a boundary component, pi„ m (a;, i) represents the 
incoming flow due to migration events from lower dimensional boundary components, and 
Pout m (x, t) is the outflow due to migration events from Si) n (%2, S2) fl . . . (i a , S a ) 
towards higher dimensional components that have (ii, Si) D S2) fl . . . (i a , 5 a ) as a 
boundary component. 

We can write in a more precise way each term in pi^' 3 ^ 12 ' 32 ^--) {x, ^ as f u ows: 

• Pmut- 

a 

(3.13) P %i? l!i *' Sai "- iia ' aa \x,t) =Y, 2N e,aU6 a .K_ 1 p( Xa ) - fy.a), 

a j=l 

with (5 q .a _ -i = 1 if en = K — 1, 5 a ,K-i = if a 7^ 7\ — 1, and p(x a ) is the 
mutation density (in the classical theory, p,(x a ) = S(x a — l/(27V eja ))). 

• Pdri /«: Assuming that the first derivatives of <f)(x, t) are finite at the boundary, 

(3.i4) = e eMjt-e^) 

' b 

where the sum over j a is over all components of codimension a — 1 that have 
n (t2, S2) fl... (i a ,S a ) as a boundary component. £g. 4 is 1 when 
Sj Q = A and when Sj Q = D; similarly S3. ,d is one when Sj a = D and zero 
when Sj a = A. The sum over a and b is over all populations that are not j\ , J2, ■ . ■ 

3a- 

• pin m ■ Here, and throughout, c a is a shorthand for the boundary component (j'i, Si; 
i2,S2; . . . ; ia,S a ). p[^" (x, t) represents the total incoming flow due to migra- 
tion events of SNPs that are contained in densities of SNPs located at boundary 
components of c a . If Bd(c a ) is the set of boundary components of c a with fixed 
codimension d (a < d < K), 

Bd = {(ii, Si;i2, S2; ■ ■ ■ ; i a , S a ;j a+ i, S a+ i; . . .jd, Sd)}j a+1 ,...,j d , 

then /4„ (x, t) can be written as the sum of contributions from all boundary com- 
ponents in Bd, for all codimensions d=a+l,a + 2, ...,K, and for all possible 
migration events from elements q in Bd(c a ) to c a : 

(3.15) 

p?::\x,t)= E £0 (9) (*.*)*( E K«o n 5 &> /;.>)• 

Here, r(q — » c Q ) is the set of all possible migrations events from SNPs in (j)^ 
to (f)( Ca \ p(e) denotes the probability of occurence of the migration event e, and 
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< fl £ c a denotes the expected frequency, in the i -population, of a SNP 
that enters c a after the event e. We provide below a more precise description of 
Eq. (13.15b . such as a description of the event space T(q —> c a ), the corresponding 
probabilities of occurence and expected frequencies. 
• Pout m ■ Denotes the outflow of SNPs due to migration events to higher dimensional 

m (c ) 

boundary components. In other words, p ^ tm measures the rate of loss of SNPs in 
4>( Ca \ because of migration flow towards boundary components of codimension 
d < a, that have c a as a boundary component. Let Idq.c a be a discrete function 
that returns 1 when c a is a boundary component of q, and zero when it is not. 
Thus, 



To compute Eq. ( 13.151 l and Eq. ( 13.161 l, the use of approximations is unavoidable. In 
principle, one could use the transition probabilities of the finite Markov chain to estimate 
the probabilities of different migration events and their expected allele frequencies. How- 
ever, there is a simpler approximation, which is consistent with the weak migration limit 
in which the diffusion equation is derived. 

This approximation follows from the observation that at the boundary x a = 5s,d, the 
strength of random drift along the population a vanishes (.T a (l — x a ) = 0), and hence, the 
infinitesimal change in x a obeys a deterministic equation: 



Eq. (13.171 implies that a migration event from several populations b, to a target population 
a, can push the frequency x a of a SNP out of the boundary where it was initially fixated 

(x a = 5s, d)- 

Therefore, given a X-cube, a boundary component c a (of codimension a), and a bound- 
ary component q (of codimension /3 > a) of c Q , we say that there will exist migration flow 
from q to c a , if and only if 

(3.18) = ^25 Sat ,Am atb x b - 5 Sat ,Dm atb (l - x b ) ^ 0, 

6=1 

(3.19) = E 5s n ,Am nb x b - S Sn ,Dm nb (l - x b ) = 0, 

6=1 

where {x n }" =1 denote the allele frequencies of SNPs which are fixated at the boundary 
components c a and q, {at}a t =a+i are tn e populations at c a whose allele frequencies are 
polymorphic, and the frequencies x b are defined at the boundary component q (which 
means that x b is polymorphic as long as b > /3, and is 5$ b ,d otherwise). It is important to 
realize that x b can be or 1 at q, and a migration event to c a can still bring alleles of the 
opposite state that is fixated in the target population. 

In this approximation, T(q — » c a ) consists of a single element, and p(e) can be zero 
or one. If Eq. ( 13.181 l and Eq. ( 13.191 ) are satisfied, the migration event in T(q —> c a ) has 
probability p(e) = 1, and the expected frequencies are 

K 

(3.20) f at = Y Ss "f A7ria tb x b - S Sat ,Dm atb (l - x b ). 




(3.17) 




6 
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If Eq. ( 13.181 ) and Eq. ( 13.191 ) are not satisfied, p(e) = 0, and we say that there is not 
migration flow from q to c a . 

3.2. Effective Mutation Densities. Given a constant spontaneous mutation rate in the 
species under study, of u "base substitutions per site and per generation," and expected 
number of sites v = L x u where new mutations appear in the population each generation, 
the total number of de novo mutant sites that appear in the population a, every generation, 
is 2N eXL v. We can model this constant influx of mutations by adding a Dirac delta term 

(3.21) 2N e , a v8{x a - l/(2N e>a )), 

to the K diffusion equations that govern the ancestral components of codimension K — 1 
0Oi A)r\(i 2 A)r\---(% K -i A) ^ ^ However, as we discussed above, more generally we work 
with an effective mutation density 

(3.22) 2N e , a vn{x a ). 

As a particular example of an effective mutation density, we consider a stochastic census 
population size, which is a random variable distributed as 

(3.23) F(N)dN = ^ exp(- K /(27V))d7V. 

This distribution avoids extremely small populations by an exponential tail, while large 
population sizes are distributed as ~ iV~ 2 , as shown in Fig. [5] In this model, we keep 
constant the effective population size N e that defines the variance of random drift in Eq. 
( 13.61 ). Thus, the mutation density will be 

f°° c 

(3.24) M<» = / 6{x-l/2M)— 1 cxp(-K/2M)dM. 

We can integrate Eq. (13.241 ) exactly, by making the change of variables y = 1/2M, dM = 
-dy/2y 2 : 

(3.25) ji(x) = cexp(— kx). 



3.3. Population splitting events. So far we have studied how the allele frequency density 
changes as a function of time while the number of populations K remains constant. When 
two populations split, the diffusion jumps to dimension K + 1, and the corresponding den- 
sity will obey the time evolution defined by Eq. (13.7b for K + 1 populations, with different 
population sizes and migration parameters. The initial density 4>k+i{x, xk+i, t) in the 
K + 1 diffusion problem is determined from the density 4>k{x, t), before the populations 
split. Therefore, if population K + 1 was formed by iV/ i0 migrant founders from the a th 
population, then 



(3.26) (f> K +i{x 7 x K +i,t) = 4> K {x,t) 

This formula is derived by considering the binomial sampling of 2Nf^ a chromosomes from 
population a, and using the Gaussian approximation for the binomial distribution with 
2Nf :<1 degrees of freedom and parameter x a . In the limit Nf , a — > oo, Eq. ( 13.261 ) simplifies 
to 



iV 



f,a 



-N fia .(x a -x K + 1 ) 2 /(x a (l-x a )) 



7TX a (l - X a ) 



(3.27) 



(f>K+l{x, XK+l-t) = 4>K{x,t)5(x a ~ XK+l), 
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1000 2000 3000 '1000 5000 6000 

Census population size (A'} 



Figure 5. A model for a stochastic census population size, with ex- 
ponential decay in the small population size limit, a quadratic decay 
~ N~ 2 in the large population size limit, and a population peak at N = 
1000. 



with 5(x) the Dirac delta. Additionally, if the new population is formed by migrants from 
two populations merging, with a proportion / from population i and 1 — / from population 
b, then 

(3.28) <j> K+ i(x,x K +i,t) = 4>K{x,t)5{fx a + (1 - f)x b - x K+ i). 

In the diffusion framework, one can also deal with populations that go extinct or become 
completely isolated. More precisely, if we remove the a th population, the initial density in 
the K — 1 dimensional problem will be 

(3.29) 4> K -i{x,t) = I 4> K (x,t)dx a , 

J [0,1] 

with x denoting the vector x = (xi, x%, . . . , x a -i,x a +i, . . . xk)- 

4. Solution to the diffusion equations using spectral methods 

The idea behind spectral methods consists of borrowing analytical methods from the 
theory of Hilbert spaces to transform a partial differential equation, such as Eq. ( 13.71 ), into 
an ordinary differential equation that can be integrated numerically using, for instance, a 
Runge-Kutta method. 

In general, the problems in which we are interested are mixed initial-boundary value 
problems of the form 

( 4.1) = L FP {x,t)^{x,t) + p{x,t), x G D = [0, if, 

(4.2) B(x)<f>(x, t) = 0, xedD,t> 0, 

(4.3) 4>(x,0) =g(x), xeD, 

where D = [0, 1} K is the frequency spectrum domain with boundary dD, Lfp(x, t) is a 
linear differential operator also known as the Fokker-Planck operator, p(x, t) is a function, 
and B(x) is the linear boundary operator that defines the boundary condition. In this paper, 
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we are interested in the particular set of PDEs defined in Eq. ( 13. 1\ . although we sometimes 
keep the notation of Eq. (14. jj as a shorthand. 

We assume that <fi(x, t) is for all t an element of a Hilbert space H of square integrable 
functions, and associated L 2 -product ( , ) L 2. Furthermore, we assume that all functions in 
H satisfy the boundary conditions imposed by Eq. ( 14.21 ). In spectral methods we consider a 
complete orthogonal basis of functions for %, that we denote by {ipi(x)}^Z , which obeys 

(4.4) (fa, tpj) L 2 = hiSij, 

with hi a function of i that depends on the particular choice of basis functions. One then 
approximates <p(x, t) as the truncated expansion 

A-1 

(4.5) VK<t>{x,t) = y £ d a i {t)i) i {x). 



Similarly, one approximates the PDE in Eq. (14.11 ) by projecting it onto the finite dimen- 
sional basis {V'i( a; )}i^o 1 ' as 

(4.6) §1 Va ^ x > *) = v ^ l fp(x, t)T A 0(x, t) + T AP (x, t). 

By H\ we denote the finite dimensional space spanned by {ipi(x)}f~Q, and by Va the 
corresponding projector % — > Ha- If 

A-1 

PAL F p(x,t)PA<i>(x,t) = V] u>ij(t)ipi{x)aj{t) 



i,j=0 



A-1 



V A p(x 7 t) = ^ Mi( x )i 
i=0 

we can re-write the ODE in Eq. (14.6b using just modal variables as 
(4.7) _^ = g w „ (t)a .( t)+A . 



One can solve Eq. ( 14.71 ) by discretizing the time variable t, and using a standard numer- 
ical method to integrate ODEs. Therefore, the spectral solution to the diffusion PDE is 
expressed in the form of a truncated expansion, like Eq. ( |4.5t , and has coefficients deter- 
mined by the integral of Eq. d4.71 i. 

There are many different ways to construct sequences of approximating spaces Ha that 
converge to T~L in the limit A^oo, when the domain is the K -cube. Here, we follow other 
authors' preferred choice |7|, and choose the basis of Chebyshev polynomials of the first 
kind. In the following section we introduce Chebyshev expansions and show why they are 
a preferred choice. 

4.1. Approximation of functions by Chebyshev expansions. Let {Ti(x)}°Z be the ba- 
sis of Chebyshev polynomials of the first kind. They are the set of eigenfunctions that solve 
the singular Sturm-Liouville problem 
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with i = 0, 1, . . . , oo, and — 1 < x < 1. {T^x)}^^ are orthogonal under the L 2 -product 
with weight function w(x) = 1/ \/l — x 2 : 

dx TTC 

'[-14] ' JX ' Vl - 



(4-9) / T^Tjix)^^ = ^5 lv 



where Co = 2 and c;>o = 1- This basis of polynomials is a natural basis for the approxima- 
tion of functions on a finite interval because the associated Gauss-Chebyshev quadrature 
formulae have an exact closed form, the evaluation of the polynomials is very efficient, and 
the convergence properties of the Chebyshev expansions are excellent Q- 

The Chebyshev polynomials of the first kind can be evaluated by using trigonometric 
functions, because of the identity Ti(x) = cos(i arccos(x)). The derivatives of the basis 
functions can be computed by utilizing the recursion 

(4.10) ^) = -^^) + ^^ 
to express the derivative as 

i-i 

(4.11) T>(x) = ~M X )- 

j=0 odd 3 

Similar formulae can be found for higher derivatives. The coefficients in the expansion 

A-l 

(4.12) V A f(x) = Y,aiTi(x), 

can be calculated by using the orthogonality relations of the basis functions 

2 I' 1 dr 

(4.13) di = — f(x)Ti(x 



-7T 



TTCi J_ 1 Vl - X 2 

However, a direct evaluation of the continuous inner product, Eq. ( 14.13b . can be a source 
of considerable problems, as in the case of the Fourier series. The classical solution lies in 
the introduction of a Gauss quadrature of the form 
(4.14) 

— / f{x) T i{x)-7==f - ~FiYl f{x k )T t (x k ), x k = cos [ ^rFT- 1 

TTCi J_x VI - X 2 CiQ J V 2( 5 

If f(x) is smooth enough, the finite sum over Q grid points in Eq. ( 14.14t will converge 
quicker than 0(Q~ 1 ) to the exact formula 03]. As Eq. (I4.14l i is equivalent to a discrete 
Fourier cosine transform, general results on the convergence of cosine transforms apply 
also to this problem. One can see this relationship by considering the change of variables 

x = cos y: 

or 1 dx 2 I"* 

(4.15) — / f(x)T i (x)-== = — f(cosy)cos(iy)dy, 

and choosing Q equally spaced abscissas in the interval < y < tt, 
(4.16) 



2 7T 

/(cos y) cos(iy)dy ~ — ^ / 

irci y fc=l 



2k- 1 

TT 



2Q 



2k -1 N 
l ~2Q~ 7T J 
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In order to study the convergence properties of the Chebyshev expansions Eq. ( 14.121 ). 
we exploit the rich analytical structure of the Chebyshev polynomials. By using the identity 
Eq. ( 14.81 ). one can re-write Eq. (14.131 ) as 



(4.17) 



7>s 



;dTi(x) 



dx 



dx. 



If f(x) is C7 1 ([— 1, 1]) (i.e., if its first derivative is continuous), we can twice integrate by 
parts Eq. (14.17b to obtain 

fi 

(4.18) 



f 

J-i dx 



y/T 



;df(x) 



d.r 



Ti{x) 



:dx. 



We can repeat this process as many times as f(x) is differentiable; thus, if f(x) <G C 2q 1, 
then 



(4.19) 



If we use the truncation error 
(4.20) 



\\f(x)-V A f(x)\\ L * = 



A-l 



dx 



Ti(x) 



dx 



dx 



1/2 



1/2 



as a measure of convergence of the Chebyshev expansion, we may estimate its asymptotic 
expansion by calculating the rate of decrease of a,. But as we showed in Eq. ( 14.191 ). 
\ a i\ = ^r> f° r some constant c(q) if f(x) 6 C 2q ~ 1 ({— 1, 1]). Therefore, for large A the 
error decreases as a power law 

1/2 



(4.21) 



\f(x)-V A f(x)\\ L 2 




< 



k 2 i- 



and if the function is infinitely differentiable (q = oo), the corresponding Chebyshev series 
expansion will converge faster than any power of 1/ A. 

In the applications of this paper we will work with re-scaled Chebyshev polynomials. 
As the Allele Frequency Spectrum is defined on the interval [0, 1], or direct products of 
it, we re-scale the Chebyshev polynomials to obtain an orthonormal basis on [0, 1]. More 
precisely, the basis that we use is {Ri(x) — ajT,((l — x )/^)}^Lq with x € [0, 1], a — 
\l \pK, a«>o = VZ/t/tt, L 2 -product: 

'[0,1] y/x{l-X) 



(4.22) (/, g) 

and orthonormality relations, 



(4.23) 



Ri(x)Rj(x)- 



dx 



[oa] J ' - x) 



4.1.1. High- dimensional domains and spectral approximations of functional spaces. The 
joint site frequency spectrum of K populations can be defined as a density on [0, 1]^. A 
natural basis of functions on the Hilbert space L^([0, 1]^), comes from the tensor prod- 
uct of one dimensional functions. More particularly, we consider the tensor product of 
Chebyshev polynomials 

(4.24) ^,i 2 ,-iK( x ) = R ii (x 1 )R i2 (x 2 ) ■■■ Ri K (x K ), 



18 



SERGIO LUKIC 1 , JODY HEY, AND KEVIN CHEN 



because L%([0, l] K ) = L 2 W ([Q, 1]) <g> • • • <x> L^([0, 1]). Therefore, any square integrable 
function F(x) under the L 2 -product 

(4.25) (F{x), G(x)) w = f F(x)G(x) f[ dXa 

•W] K a=l V - X a) 

can be approximated as multi-dimensional Chebyshev expansion 

Ai-lAa-l A K -1 

(4.26) F(x) = ^2 ^ ■■• ai l ,i. 2 ,... lK R ll {xi)R l2 {x 2 )---Ri K {x K ). 

i 1= i 2 =0 i K =0 

The truncation parameters Ai, A2 . . . can be fixed depending on the analytical properties 
of the set of functions that one wants to approximate and their corresponding truncation 
errors. There always exists a trade-off between the accuracy of the approximation (the 
higher the A the more accurate the approximation) and the speed of the implementation 
of the algorithm (the lower the A, the faster the algorithm); therefore, choosing different 
values of Aj will yield more optimal implementations of the algorithm. Here, for simplicity 
in the notation, we use a unique truncation parameter A = Ai = • ■ ■ = Ak- 

4.2. Diffusion Operators in Modal Variables. In order to approximate the PDEs defined 
in Eq. ( 13.71 ) by a system of ODEs in the modal Chebyshev variables such as Eq. (14. 7t , we 
need to project the Fokker-Planck operator in the Chebyshev basis spanned by Eq. ( 14.241 i. 
Later on we will show how to deal with the influx of mutations specified by the Dirac 
deltas. 

A direct projection of the Fokker-Planck operator onto the Chebyshev basis spanned 
by Eq. (14.241 ), would require storing the coefficients in a huge matrix with A 2K matrix 
elements. Fortunately, the Fokker-Planck operator in our problem is very simple, and its 
non-trivial information can be stored in just four sparse A x A matrices. First, we need the 
random drift matrix 

1 I" 1 d 2 dx 

(4.27) Dij = - / Ri(x)-^ (x(l - x)R j (x)) 

2 Jo dx y/ x (l - x) 

and then, the three matrices needed to reconstruct the migration term 

f 1 dx 

(4.28) G i:j = / Ri(x)xRj(x)- 

Jo 



(4.29) Hij = / Ri(x) 



'0 yxil - x) 

1 . . dRj (x) dx 



(4.30) Jij = / R t {x)x 



dx yj x {\ - x ) ' 
dRj (x) dx 



dx y/ x (l - x) 

The matrix elements in Eqs. ( 14.271 ). (I4.28l l. ( 14.291 ) and ( 14.301 ) can be quickly determined 
by means of the Gauss-Chebyshev quadrature defined in Eq. ( 14.141 ). Due to the properties 
of the Chebyshev polynomials many matrix elements vanish. More particularly, and 
Jij are upper triangular matrices (i.e., Dij = Jij = if i > j), Hij = if i > j, and 
Gij = if i > j + 1 or i < j — 1. Thus, the total number of non-trivial matrix elements 
that we need to compute, for a given A, is just |A 2 + ?A — 2. This is much smaller than 
the default number of matrix elements (i.e., A 2K ). 



Finally, the A K x A K matrix elements of the corresponding uj matrix in Eq. ( 14.7b can 
be easily recovered from the tensor product structure of the A A -dimensional vector space 
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that defines the Chebyshev expansion (as in Eq. (14.261 l). Thus, ^i 1 ...i P ,j 1 ...j K = 
( 431 )X! -^— D io,,j a - ^rn ab {H ia . Ja G lbA - S iaij J ibtjb - Ji a ,jji b , jb ) , 

a a.b 

with Sij = 1, if i = j, and Sij = if i ^ j. 

4.3. Influx of Mutations. The inhomogeneous terms in Eq. ( 13.71 ) that model the influx 
of mutations are given by effective mutation densities. As we show in the appendices, a 
model of mutations given by an exponential distribution will give the same results, up to an 
exponentially small deviation, as a standard model with a Dirac delta. The motivation for 
using smooth effective mutation densities is that they are better approximated by truncated 
Chebyshev expansions. As we briefly explained in the review on Chebyshev polynomials 
and its truncated expansions, the convergence of a truncated expansion depends strongly on 
the analytical properties of the function to be approximated. As Dirac deltas are not smooth 
functions, their truncated Chebyshev expansions present bad convergence properties. This 
is related to the problem of Gibbs phenomena, and we will give a more detailed account of 
its origin below (see Sources of error and limits of numerical methods). 




Derived allele frequency (*) Standard deviation ol the Gaussian function in) 



Figure 6. On the left, we show the plot of five different truncated 
Chebyshev expansions for a Gaussian peaked at x = 0.5 and a = 0.03. 
On the right, we show the truncation error of different Chebyshev ex- 
pansions (with A = 3, 6, 10 and 15) of a family of Gaussian func- 
tions peaked at x — 0.5 and parametrized by the standard deviation 
0.01 < a < 0.5. 

In this paper, we only consider a positive influx of mutations in boundary components 
of dimension one. In order to approximate the behavior under a mutation term given by a 
Dirac delta, an effective mutation density fi(x) has to satisfy the following: 

• The truncation error is bounded below the established parameter, e, for the control 
of error; i.e. , \\fJ,(x) — Va^{x)\\l 2 < e - 

• The expected frequency of the mass mutation-function is E /i (x) = j^-. 

• The mutation-function is nearly zero for relatively large frequencies (e.g., x > 
0.05), and it is as peaked as possible near x = 1/ (2N e ). 
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While the first and third qualitative requirements are straightforward, the second numerical 
condition is not. One can interpret this requirement as equivalent to fixing the neutral fixa- 
tion rate to be u, because the probability that an allele at frequency x — p reaches fixation 
at x = 1 is p. Thus, the average number of new mutants that reach fixation per generation 
is 2N e u x E M (x) = u. This constraint can also be derived by studying the properties of 
the equilibrium density associated with this stochastic process. At equilibrium, the density 
4> e (x) of derived alleles obeys 

(4.32) i^S-- 2 ^)' 

with ip(x) = x(l — x)<fi e (x). Therefore, the expected frequency of the mass mutation- 
function can be computed as 

r 1 , , if 1 d 2 ^ 

(4.33) ^ x»{x)dx = - — ^ x—dx. 
Using the identity x^ = x^ - %k one can rewrite Eq. ( 14.33b as 

(4.34) [ X ^ dX = -^f u fx^ = ^ 

On the other hand, the probability flux associated with the equilibrium density of alleles at 
the boundary x — 1 is j(l) = — ^-^'(l). We use this to write the expected frequency of 
the mutation density as: 



f 1 1 
(4.35) / xn (x )dx = ——j 1). 

Jo 2N e u 

In neutral evolution the probability flux at the boundary x = 1 equals the fixation rate, 
which satisfies = u. Therefore, Eq. ( 14.351 has to satisfy 

1 1 1 



(4.36) / xfi(x)dx = — — i(l) = 



o 



2N P u v ' 2N P u 2N e 



which is what we wanted to show. 

Numerical experiments show that for a large class of functions (j,(x), and in the fre- 
quency range x > x*, the associated solutions to the different diffusion problems are 
identical (up to a very small deviation) to the standard model with a Dirac delta, a;* is 
a very small frequency that depends on the choice of n{x), and generally can be made 
arbitrarily small. It is in the region of the frequency space with < x < x*, where the 
behavior of the different diffusion problems can deviate most. 

The truncation error in the Chebyshev expansion depends on the smoothness of the 
function, and the choice of truncation parameter (see Fig. [6] for an example). For the 
effective mutation density (i(x), we use the exponential function 

1 k 2 

(4.37) fi(x) = — x r exp(-nx), 

2N e 1 — cxp(— k) — k exp(— k) 

where the values for k(A, e) 3> 1 are determined by saturating the bound on error: || /i(x) — 
Vaij,(x)\\ L 2 < e. 
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4.3.1. Comparison of different mutation models at equilibrium. We derive in the Appen- 
dix A the associated equilibrium distributions of derived alleles. For a model with a muta- 
tion density given by a Dirac delta, one finds the equilibrium density 

4N e (2N e - l)ux - m 2 e u(x - l/(2N e ))9(x - l/(2N e )) 

(4.38) (j)e(x) = , 

x(l — X) 

with 8(y) the Heaviside step function (9(y) = for y < 0, 9(y) = 1/2 for y = 0, and 
6{y) = 1 for y > 0). Which in the region x > 1/(2^,,) simplifies to 

(4.39) M x) = ^H. 

x 

In the case of fJ,2(x) = ccxp(-Kx), the corresponding equilibrium density is 
(4.40) 

4N e u(l — x) + 8Af °" c (exp(— k)(1 + 1/k) — exp(— n)x — — cxp(— nx)) 
(Pe{x) = j- : ■ 

x(l — x) 



Figure 7 . Three comparisons of the equilibrium densities associated 
with the exponential mutation density (blue) for several values of k 
vs. the equilibrium density associated with the Dirac delta mutational 
model (red). For illustrative purposes, the population size used was 
N = 10, 000 and the spontaneous mutation rate is u = 10 -6 . On the left 
the equilibrium density associated with the exponential distribution with 
K = 10 is shown, in the middle k = 20, and on the right k = 40. For a 
truncation parameter A = 20, one can choose mutation densities with n 
up to 43, while keeping the truncation error below sensible limits. 

Therefore, a pairwise comparison of both equilibrium densities shows that the deviation 
from both models when x > x* = k^ 1 is exponentially small when equilibrium is reached 
(see Fig. [7}. We can show that the same is true in non-equilibrium. 

4.3.2. N on- equilibrium dynamics with effective mutation densities. Here, we show how 
the non-equilibrium dynamics of a diffusion system under an exponential distribution mu- 
tation influx is the same (up to an exponentially small deviation) as a system where muta- 
tions enter the population through the standard Dirac delta 5{x — 1/ (2N e )), as long as the 
allele frequencies are bigger than certain minimum frequency x*. Below x* the dynamics 
will be sensitive to differences in the mutation densities. 

Let (f(x) be an arbitrary initial density of alleles. Let <pi(x,t) be the solution to the 
diffusion equations under pure random drift and a mutation influx given by 6 (x— 1/ (2 N e )) . 
<p2(x,t) is the solution of the diffusion equations under pure random drift and mutation 
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influx given by the exponential effective mutation density Eq. (14.37b . In Appendix B, we 
prove the following identity in the large t limit 



(4.41) / \<jn(x,t)-<l>a{x,t)\x(l-x)dx = — — (l-exp(-t/(2JV e )))+0(exp(-«)), 



H 



with <f>i(x, 0) = (p2 (x, 0) = <p, and k < 2N e . If we normalize Eq. ( 14.4 U by 

lim / \<f>i(x,t)\x(l — x)dx, 

the normalized deviation of faix, t) from <j)\ (x, t) is, for large t, 

L \<t>i(x,t) — d>2(x,t)\x(l — x)dx 2, ,,,,,, , i> 

(4.42) , ' m ' V; V \ , , rr- = -(1 - exp(-t/(2iV e ))) + 0(e~ , N~ ). 

limt^oo Jo \9i(x,t)\x(l - x)dx 

We can also show, by applying the Minkowski inequality to Eq. dB.lQI i in Appendix B, 
that 

L \6i(x,t) — d>2(x,t)\x(l — x)dx 2, , ... . 

(4.43) rl| ; \ ' ; r^- < -(l + cxp(^/(2^)))+0( e ^,7V- 1 ), 

limt^oo Jo \9i{x,t)\x{l - x)dx 

for all i > 0. This means that the deviation is bounded by 0(k _1 ) for all t, and therefore 
the non-equilibrium dynamics of cf>i (x, t) and (j>2 {x, t) are identical in the large k limit. 

As \4>i(x,Q) — 02(x,O)| = at time zero and the deviation of cf>2(x,t) from <pi(x,t) 
attains equilibrium in the large t limit, we can study the frequency dependence of such 
deviation by looking at the equilibrium 

AN ue~ KX 

(4.44) lim Mx,t)~Mx,t) = —£ - + 0(e~ K ). 

t->oo X[l — X) 

Here, the 0(e~ K ) term exactly cancels the singularity at x = 1 and the deviation decays 
exponentially as a function of the frequency. This shows that for frequencies x > x* = 
> 1/ (2N e ) the dynamics of a model with mutation influx given by a Dirac delta is the 
same, up to an exponentially small deviation, as the non-equilibrium dynamics of a model 
with exponential mutation density. 

4.4. Branching-off of populations. Modeling a population splitting event also involves 
the use of Dirac deltas, as in Eq. (13.27b . or peaked functions such as Eq. d3.261 l. whose 
truncated Chebyshev expansions may present bad convergence properties. These Gibbs- 
like phenomena can be dealt with in a similar way as we did with the mutation term of Eq. 

<E3. 

We implemented two different solutions to this problem and both solutions yielded sim- 
ilar results. First, we constructed a smoothed approximation of the Dirac delta by using 
Gaussian functions: 

with 

(4.46) w(x a ) = [ cxp (- {Xa ~, XK + l)2 ) dxK+i, 



2a(x a y 
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Figure 8 . Diffusion under pure random drift acts by smoothing out 
the initial density at t = 0. Here we show numerical solutions to the 
diffusion equations with A = 28, at 5 different times, with initial condi- 
tion c/>(x, t = 0) = S(x — 0.3). As time passes, the numerical solution 
approaches the exact solution more quickly, and the Gibbs phenomena 
disappear. 



and <r(x a ) as a standard deviation which is chosen as small as possible while preserving 
the bound on error, \\S(x a — x) — V\6(x a — 2;)||l 2 [o<.t<i] < e, for any value of x a € [0,1]. 
In order to map the <5-function in Eq. ( 14.45b to a truncated Chebyshev expansion, 

A-1 A-1 

(4.47) T A 6(x 

) = AijRi(x a )Rj(xK+i), 

i=0 j=0 

one has to perform a Gauss-Chebyshev quadrature in 2 dimensions, < x a < 1, < 

xk+1 < 1: 

f 1 P 1 ~ dx dx 

(4.48) A. y =/ / 8{x a -x)R l {x a )R {x) a 

JO JO y/X a {l — X a ) y/X(l — X) 

The second approach exploits the analytical behavior of diffusion under pure random 
drift (i.e., with no migration). By Kimura's solution to the diffusion PDE in terms of the 
Gegenbauer polynomials {Gi(z)}, see lfP2l . we know that the time evolution of 1-d density 
is 

(4.49) T^{x,t) = J2 rS^(l - r 2 )C? ? (r)C? ! ;(z)exp(-( l + l)(i + 2)t/AN), 

with r = 1 — 2p, z = 1 — 2x and 0(x, 0) = 5(x — p). Thus, in the exact solution to 
the diffusion equation, the time evolution of the coefficients of degree i in the Gegenbauer 
expansion is described by the term exp(— (i + l)(i + 2)t/4N). This means that diffusion 
smooths out the Dirac delta at initial time. Fig. [8]represents the evolution of the density at 
different times. 
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Thus, we can use diffusion under pure random drift to smooth out the density introduced 
after the population splitting event. Let 4>k{x, t) be the density before the splitting and let 
a be the population from which population K + 1 is founded. We initially consider the 
density 



The associated Chebyshev expansion Va^k+i {%, xk+i , t) will present Gibbs-phenomena. 
However, by diffusing for a short time r under pure random drift 



(with S a = S K +i = W, S b = VfoxK + 1 ^ b ^ a, and V > W), (f>K+i(x,x K +i,r) 
becomes tractable under Chebyshev expansions. In other words, by choosing r such that 
the error bound is satisfied 



we obtain a smooth density after the population splitting event which can follow the reg- 
ular diffusion with migration prescribed in the problem, and approximate accurately the 
branching-off event. In some limits this approximation can fail, though we leave the cor- 
responding analysis for the next section. 

Here, we do not consider the numerical solution to the problem of splitting with admix- 
ture, although we are confident that it should be possible to solve along similar arguments. 

4.5. Sources of error and limits of numerical methods. There are two major sources 
of error in these numerical methods. First, the solution of the diffusion equation is itself 
a time-continuous approximation to the time evolution of a probability density evolving 
under a discrete Markov chain. Hence, whenever the diffusion approximation fails, its nu- 
merical implementation will also fail. Secondly, a numerical solution by means of spectral 
expansions involves an approximation of the infinite dimensional space of functions on a 
domain by a finite dimensional space generated by bases of orthonormal functions under 
certain L 2 -product. As we show below, under a broad set of conditions the numerical so- 
lution will converge accurately to the exact solution; otherwise, the numerical solution can 
fail to approximate the exact solution. A third source of error appears because one has to 
discretize time in order to integrate the high-dimensional ODE that approximates the PDE. 
Fortunately, this source of error can be ignored because the diffusion generators yield a 
stable time evolution. 

We summarize below the main conditions that have to be satisfied in order to obtain 
high-quality numerical solutions to the PDEs studied in this work. 

4.5. 1 . Limits of diffusion equations. In the diffusion approximation to a Markov chain, the 
transition probability is approximated by a Gaussian distribution J5) . Here, we review the 
derivation of the diffusion equation as the continuous limit of a Markov chain, in order to 
emphasize the assumptions made and determine the limits of this approximation. 

Given a Markov process defined by a discrete state space S, transition matrices p(I\J), 
initial value K € S and discrete time r = 0, 1, . . ., the probability that the state will be at 
I at time r is f(I\K, r), where f(I\K, r) obeys the recurrence relation 



(4.50) 



4>K+l(x,X K +l,t' = 0) = 4>K{x,t)S(x a - XK+l)- 



(4.51) 




\\<1>k+i(x,xk+i,t) -Va(I>k+i(x,xk+i,t)\\ L 2 < e, 



(4.52) 




Jes 
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In the diffusion approximation one considers a sequence of discrete state spaces {6>a}asz + 
such that in the limit A — > oo the state space Soc converges to a smooth manifold (in 
practical applications, a compact domain D C B, K ). 

In this paper, we take S\ to be [0, A] K , and Soo ~ [0, 1] K . Therefore, the state variables 
can be re-scaled as K a /\ — x a , with a = 1,...,K and K a £ [0, A]^. Similarly, we 
introduce the time variable t = r/A. In the large A limit, the transition probability for the 
change of the state from time r/A to time (r + 1)/A is governed by a distribution with 
moments 

(4.53) E(5x a \x) = M a (x)/X + 0(1/A 2 ), 

(4.54) E(5x a 5x b \x) ~ E(5x a \x)E(5x b \x) = C ab (x)/X + 0(1/A 2 ), 

(4.55) E(Sx 3 a \x) =0(1/X 2 ). 

In this continuous limit, the equation that describes the time evolution of the Markov chain 
in Eq. (14.52b can be written as a forward Kolmogorov equation if we neglect terms of order 
0(1/A 2 ). However, if M a (x) is proportional to A, the 0(1/A 2 ) terms in Eq. ( I4.53l l cannot 
be neglected and the diffusion approximation will not be valid. As in this paper we take 
A = 2N e , and M a (x) proportional to the migration rates m ab , if the migration rates obey 
2N ea m ab < 0(1) the diffusion approximation will be valid. Indeed, computer experi- 
ments show that the numerical solutions become unstable and yield incorrect results if this 
bound is violated. This limit precisely defines when two populations can be considered the 
same ||9)- Therefore, in cases when 2N e ^ a m ab ^ 0(1), we can consider populations a and 
b as two parts of the same population. Another assumption in the diffusion approximation 
is that a binomial distribution with 2N e ^ a degrees of freedom can be approximated by a 
Gaussian distribution. This will be a valid approximation as long as N eM is large enough. 
Numerical experiments show that the approximation is accurate if N eM > 100; otherwise, 
effects due to the finiteness of the Markov chain cannot be neglected and the approximation 
will fail. 



N eA > 100 
2N eia m ab < 0(1) 



4.5.2. Limits of spectral expansions. Spectral methods, as with any numerical scheme for 
solving PDEs, require several assumptions about the behavior of the solution of the PDE. 
The most important one is that one can approximate the solution as a series of smooth basis 
functions, 

A-l A-1 

(4.56) V A <t>{x,t) = ■ ■ ■ a iu i 2 ,... lp (t)T ll (x 1 )T. l2 (x 2 ) ■ ■ - T ip (x P ). 

ii—Q ip—0 

In other words, the projection of the solution 'Pa'X 2 ^ *) i s assumed to approximate cj>(x, t) 
well in some appropriate norm for sufficiently large A. As one has to choose finite values 
for A, Eq.( |4.56b will sometimes fail to approximate correctly the solution of the PDE. 

In the applications of this paper, the basis of functions that we use consist of the Cheby- 
shev polynomials of the first kincfl. Below we provide bound estimates for the truncation 



One can work either with the basis of functions {Ti(x)} on x G [—1, 1], or with the re-scaled basis 
{Ri(x)} defined onx 6 [0, 1], by performing a simple scale transformation. 
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error \\VA<f>(x, t) — <f>(x, f)||£3[_i iik, to understand the quality of the approximate solu- 
tions for different values of A, (see also |7]|T) for different choices of basis functions). 
More precisely, as the L 2 inner product and norm in the Chebyshev problem are: 



(4.57) 

and 

(4.58) 



(/, g)v>[-i,i\K = [ f{x)g{x) TT — ^= 

■A-i,i] K ti v 1 - 



i 2 [-i,i] 



i/(x)i 2 n^ 

i.iP t=i V 1 - 



r- 



the terms in the expansion Eq. ( |4.56t can be computed by performing inner products 
(4.59) 



o 



ll ,12, ...Iff 



(*) 



A" 



/ 0(x,/:)r n (a;i)T i2 (a;2) ■ • • T 4K (2;^) T 

•A- MP x _i 



3=1 C ij \/l 21 



with Co = 2 and Cj = 1 (j > 0). A consequence of the orthogonality of the basis functions 
is that the squared truncation error admits a simple formulation in terms of the coefficients 
in the expansion: 



(4.60) \\T A 0(x,t) - <b(x,t)\\ 2 L2[ _ 11]K 



K 

■■■ i a »,w,...«(*)i 5 



ii>A iif>A 



Thus, the truncation error depends only on the decay of the higher modes \ai lt i a ,,,,i K | in the 
expansion. On the other hand, the decay of these higher modes depends on the analytical 
properties of (j>(x, t) itself. For instance, if tp(x, t) G C^ 91-1,2 * -1 '-"' 29 * -1 ([-1, 1]*), 
i.e. if 



(4.61) 



d 
dxi 



2gi-l 



_d_ 

dx-. 



2?2-l 



dx K J 



2q K -l 



(j)(x,t) 



< 00, 



L 2 [-1,1] K 



we can integrate by parts Eq. ( 14.59b . as we did in Eq. ( 14.18b , to write the decay of each 
mode as 



(4.62) 



\a iui2 ,...i K {t)\ = (^j 



K 



[-1,1]* 

TiAxi) 



K 

n 

3=1 



1 — r 2 

> dx, 



2<ij 



(j)(x,t) 



-dx x ■ ■ 



T iK {x K ) 



C 1 qK */l - T 



A 



Eq. ( I4.62l i implies that the truncation error is directly related to the smoothness of 4>(x, t); 

it follows that we can bound the truncation error as a function of A: 

(4.63) 



||Pa0(M)-<KM)IU=[-i,i]* < C{q)A-^ 



- E, n 



A 

n 

3 = 1 



1 



0(a;,<) 



L 2 [-1,1] K 
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Another convenient measure of smoothness is the Sobolev norm: 

91 IK 

(4.64) «c[-i,i]* = E E 

s-i—O sp—0 

in terms of the Sobolev norm, the truncation error is bounded as 

(4.65) ||Pa0(*,O-#M)IU=[-i,i]* <CA-£j« H^i) || ajr[ _ lilF . 

A corollary of Eq. ( 14.65b is that if <p(x, t) is smooth, V\^)(x, t) converges to 4>(x, t) more 
rapidly than any finite power of A -1 . This is indeed the basic property that has given name 
to spectral methods. 

In the absence of influx of polymorphisms in the populations, the time evolution of the 
density obeys pure diffusion, and therefore |(*i 1) j 2! ...i K . (t)\ — > when t — > oo as it follows 
from Eq. ( 14.49) . This means that diffusion acts as a smoothing operator on the initial 
density. Empirically, we find that in the presence of influx of polymorphisms the density 
can also be approximated by spectral expansions and the truncation error remains low. 

After two populations split and the A' -dimensional diffusion becomes a K + 1 dimen- 
sional process, the K + 1 dimensional density becomes a distribution concentrated in the 
linear subspace of [— 1, defined by x a = x a+ \ (with a and a + 1 labeling the two 

daughter populations that just split). Such density has an infinite Sobolev norm and cannot 
be represented as a finite sum of polynomials. Fortunately, the diffusion generator acts on 
the density by smoothing it out and by bringing the density to a density with finite Sobolev 
norm. The main variables involved in this process are: the time difference between the 
current splitting event and the next one, Ta+i — Ta , and the effective population sizes 
N e a and 7V e a+ i of the daughter populations. Therefore, depending on the choice of the 
truncation parameter A, a minimum diffusion time t m (N e!a , N ea +i, A) will be necessary 
to bring the truncation error within desired limits HPa^x, i m ) — t m )\\L 2 < e - Here, e 
is the control parameter on numerical error. Therefore, the bigger the largest effective pop- 
ulation size of the two daughter populations, the bigger will be such minimum diffusion 
time. If the time difference between the current splitting event and the next one is bigger 
than 

(4.66) t m (N e , a ,N eia+1 , A) = C7(A)max(/Y eia ,iV e , a+1 ), 

(where C(A) is a function that can be computed numerically), the resulting numerical error 
will stay below the desired limits. As our model aims to reproduce the real SNP Allele 

\T A +i-T A \ > C(A)max(/V e 

\\VA7(x)-i(x)h*<e 

v ) 



nil 



$(x) 



L 2 l-1.1) K 



Frequency Spectrum density there should exist low error approximations of such density 
(that we denote as 7(2:)) in terms of polynomial expansions. Otherwise, the methods here 
presented will fail to solve the problem. This can only happen if 7(2;) is so rugged, i.e. 
the corresponding Sobolev norm is so high, that the largest finite choice for A that we can 
implement in our computer-code is not large enough to approximate accurately j(x): 

(4.67) WPA ma ^(x) -j(x)\\ L2 [_ 1A] K - CA„J?' 9j 1170)11^1 «k[_i,i]k > e- 



28 



SERGIO LUKIC 1 , JODY HEY, AND KEVIN CHEN 



In case that Eq. ( 14.671 ) is obeyed, it is likely that 2 or more populations are so closely 
related that we can treat them as if they were one identical population. If we reduce the 
dimensionality of the problem in this way (by only incorporating differentiated popula- 
tions), the correlations will disappear and the Sobolev norm of 7' (a;) will be such that we 
will be able to find a sensible parameter A to approximate j'(x) as a truncated Chebyshev 
expansion. 

5. Conclusion 

In this paper we have introduced a forward diffusion model of the joint allele frequency 
spectra, and a numerical method to solve the associated PDEs. Our approach is inspired by 
recent work in which similar models were proposed ETl l4l [§1 . Analogously, our methods 
are quite general and can accomodate selection coefficients and time dependent effective 
population sizes. 

The major novelties of the model here presented with respect to previous work are: 

• The introduction of spectral methods/finite elements in the context of forward dif- 
fusion equations and infinite sites models. Traditionally, these techniques yield 
better results than finite differences schemes when the dimension of the domain 
is high (i.e., when the final number of populations is high), and the solutions are 
smooth. A comparison of our implementation using spectral methods, and previ- 
ous implementations using finite differences (8), will be the matter of future work. 

• A set of boundary conditions that deals with the possibility that some polymor- 
phisms reach fixation in some populations while remaining polymorphic in other 
populations. When the differences in effective population sizes between different 
populations are large, this phenomenon can become very important. Here, we have 
introduced a solution to address this possible scenario. Previous work imposed 
zero flux at the boundaries [H, an d hence avoided the fixation of polymorphisms 
in some populations while remaining polymorphic in the rest. 

• The introduction of effective mutation densities, which generalize previous mod- 
els for the influx of mutations J4). We have emphasized how different ways to 
inject mutations at very low frequencies converge to the same solution for larger 
frequencies. 

The non-equilibrium theory of Allele Frequency Spectra is of primary importance to 
analyze population genomics data. Although it does not make use of information about 
haplotype structure or linkage non-equilibrium, the analysis of AFS allows the study of 
demographic history and the inference of natural selection. In this work, we have extended 
the diffusion theory of the multi-population AFS, to accommodate spectral methods, a 
general framework for the influx of mutations, and non-trivial boundary interactions. 
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Appendix A. Comparison of mutational models at equilibrium 

In this appendix we compute the equilibrium densities associated with Wright-Fisher 
processes with mutation. Two types of mutation processes are considered, both modeled 
by a mutation density. The first mutation density is a Dirac delta, while the second one is 
an exponential distribution. 

As the diffusion equation that describes the time evolution of the density of alleles for 
diallelic SNPs is 

(Al) = [x(l - x)4>(x,t)] + 2N eUf i(x), 

the equilibrium density <fi e (x) satisfies d ^Q^ = 0. By using instead the function ip(x) = 
x(l — x)(j) e (x), the associated second order ordinary differential equation becomes 



(A.2) __£^ + 2i\UM*) = 0. 



1 d 2 ip(x) 
4/Ve dx 2 

As Eq. (IA.2I ) is only defined for x > 0, we can use Laplace transforms to solve the 
equation. Let 

p oc 

(A.3) p,(s) = / n(x) exp(— sx)dx, 

Jo 

be the Laplace transform associated with the mutation density fi(x), and 
r°° d 2 iL'(x) 

(A.4) / *\ ' exp(-sx)dx = s 2 ip(s) - sip(0) - ^'(0), 

J Q dx z 

the Laplace transform associated with ijj"(x), with ip(0) and ip'(0) integration constants. 
Therefore, in the s domain, ip(s) is 

(A.5) j + WO) - *NMs) 

s 2 

and by performing the inverse Laplace transform we obtain the solution to the equilibrium 
density 

UU9 1 'to r- T »V(°) + ^(°)-^-AM 

X(l — X) ZTTt T->oo J e _ iT 

We fix the integration constants, ^(0) and ip'(0), by requiring <p e (x) to be finite at x = 1, 
and the probability flow at x = 1 to be equal to u, 

(A.7) = -^'(0) = u. 

As an example, we can evaluate exactly Eq. ( |A.6t , for fi\(x) = 5(x — l/(2N e )) and 
H2(x) = cexp(-Kx). For the Dirac delta, the Laplace transform is 

(A.8) /2i(s) = exp(-s/(2iVe)). 

If we compute the corresponding inverse Laplace transform in Eq. (IA.6I ), and fix the inte- 
gration constants as explained above, we find the equilibrium density 

4N e (2N e - l)ux - BNlu[x - l/(2N e ))0(x - l/(2JV e )) 
(A.9) Mx)- xJl^T) ' 
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with 6(y) the Heaviside step function (9(y) = for y < 0, 9(y) = 1/2 for y = 0, and 
9(y) = 1 for y > 0). If x > 1/(2N), Eq. (KM simplifies to 

(A.10) = 

x 

In the case of fJ,2(x) = ccxp(-Kx), the Laplace transform is 
(A.ll) fa(a) = -?—, 

S + K 

and the corresponding equilibrium density, after integrating Eq. JA.61 >. is 
(A.12) 

_ AN e u{\ -x) + (exp(-/c)(l + 1/k) - exp(-n)x - ± exp(-ra)) 

X[l — X) 

which in the large k limit, and for i» 1/k, converges exponentially quickly to 
(A.13) e (x) = 

x 

In the limit a; — >• 0, </> P f:r) is finite only iff c = x ? — 5 7 — T , which is 

' ' J 2iV e 1— exp( — — k exp( — k) ' 

the normalization choice made in Eq. (14.371 . and the only one satisfying 
(A.14) l! X ^=2W e - 

This shows how a mutation model defined by a certain class of smooth mutation densities 
reaches the same equilibrium density, up to a small deviation, as the standard model with 
a Dirac delta. 
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Appendix B. Comparison of mutational models at non-equilibrium 

In this appendix we compare the non-equilibrium dynamics of models with a mutation 
influx described by exponential distributions, with models that consider a standard Dirac 
delta. 

More particularly, we prove that if <pi(x,t) is the solution to an infinite sites model 
PDE, with absorbing boundaries, 

( B.l) dfrfof) = JL|1 [x(l - x)<h.(x,t)] + 2N e u8(x - l/(2N e )), 

and <j>2 {x, t) is the solution to the same model, but with an exponential mutation density 

rav\ d<j> 2 (x,t) Id 2 K 2 exp(-Kx) 
(B.2) = ——[ x (i- x )fa(x t t)]+u z 



dt 4N e dx 2 ' 1 — cxp(— re) — Kcxp(— re) ' 

then, the deviation of <f>2(x, t) with respect to <fii(x, t), as a function of time and for any 
initial condition 4>2{x, t = 0) = <f>i(x, t = 0) = <p(x), is, in the large t limit, 

f 1 AN v 

(B.3) / \4>i{x,t) - 4> 2 (x,t)\x(l - x)dx = — ^(l-exp(-V(2^V e )))+0(e- K ). 
Jo « 

Here, | • | is the absolute value, and 0(e~ K ) are terms that decay exponentially as a function 
of re, which can be neglected in the large re limit. 

As the total number of SNPs that are polymorphic in one population depends on the 
population size and the mutation rate, it is convenient to normalize the deviation Eq. ( IB. 3b 
by lim t _ i . 00 f Q \<f>i(x, t)\x(l — x)dx = (2N e — l)u. In this normalization we have 

fn \4>i{x,t) — d>2(x,i)\x(l — x)dx 2, , , 

(B.4) J ° Lr, \ M 6~ = -^-^M-t/(2N e ))) + 0(e-\N~ 1 ). 

limt^oo J \4>i{x,t)\x{l -x)dx « 

To prove Eq. ( IB. 3b . we first describe the solutions to Eq. ( IB. lb and Eq. ( IB. 2b . Both 
equations consist of a homogeneous term and an inhomogeneous contribution given by the 
mutation density. As they are linear equations, the solution to the PDE is the sum of a 
homogeneous and an inhomogeneous term 

(B.5) 1 (a:,t) = 0f(a:,t)+^(x) > 



satisfying 



d^(x,t) 1 d 2 r „ . lh , , n 



(B.6) t^C 1 " + ™ e u6(x - l/(2N e )). 

Hence, the only time-independent term <f>\ (x) that solves Eq. ( IB. 6b is the equilibrium 
density Eq. ( |A.9b , and ^(xjt) obeys a standard diffusion equation with no mutation 
density, and with initial condition 4>\{x, t = 0) = ip{x) — <f>f(x). If L<f>i{x, t) denotes the 
Fokker-Planck operator acting on <jy[ (x, t), 

C^{x,t) = ^^[x{l-x)c^{ X ,t)], 

we can write the solution to Eq. ( IB. 6b in the following compact form 
(B.7) </n{x,t) - cxp(i£) (<p(x) - (f>\[x)) + <Pl(x). 
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Here, exp (tC) is the time-dependent action of the diffusion operator on the initial density 
<p{x) — 0f (x) while preserving the absorbing boundary conditions. This operator can be 
diagonalized in the basis of Gegenbauer polynomials on £ 2 ([0, 1]); see (T2l . The corre- 
sponding eigenvalues of exp (tC) are exp(— (i + + 2)t/AN e ) with i e [0, oo). 
We can solve Eq. (IB.2l i in a similar way, by using the decomposition 



(B.8) 2 (x,t) = 2 ! (x,i) + 2 : (x). 

In this case, 0|(x) is the equilibrium density associated with the exponential mutation 
density, as defined in Eq. (IA.121 . The term §\ (x, t) evolves under pure random drift, with 
no mutation influx, and initial condition 02 (x, t = 0) = <fi(x) — 02 (x): 

(B.9) 2 (x,i) = exp (tC) (ip(x) - 4>l{x)) + 0f (x). 

By subtracting Eq. (IB.9b from Eq. JB.7t . we can describe the time evolution of the devia- 
tion as 

(B.10) 0x(x,t) - 2 (x,i) = -exp(i£) (0?(x) - 0f(x)) + 0?(x) - <%(x), 

which is independent of the initial condition <p(x). 

One can show that 0f (x) — $>\(x) is non negative on [0, 1], if k < 2N e . This can be 
seen more clearly by computing 4>\[x) — 4>2( x ) m me large k limit 

4N u 

(B.ll) #(x)-#(x) = — 5-(2JV e -K), x G [0, l/(2N e )), 

1 — x 

(B.12^(x) - 4>l(x) = m f e ~™ + 0(e-V(l - *)), x e (l/(2JV e ), 1]. 
x(l — x) 

The terms of order e _K in Eq. ( IB. 12b exactly cancel the divergence at x = 1. Therefore, 
the action of the diffusion operator on 4>\{x) — </>|(a;), will preserve the non-negativity of 
the density 

(B.13) ex${tC)(<t>l(x)-<t>%{x)) > 0, VxG[0,l], Vi > 0. 

Because of this inequality, the absolute value | exp (tC) (0f (x) — </)|(x))|, is the same as 

exp (tC) (0i (x) — 0| ( x ))< an d we can evaluate exactly the integral 

CB.14) 

| exp (tC) (4>i{x) ~ 4>l{x))\x{\ - x)dx = ( exp (tC) {<f>\ (x) - <t>l{x))x{\ - x)dx, 



by expanding exp (tC) (0f(x) — 01 ( x )) m tne eigenbasis of exp (tC). This basis is or- 
thogonal under the L 2 -product defined by the weight x(l — x), and the constant function 
on [0, 1] corresponds to the eigenfunction with the smallest eigenvalue. In this way we 
can interpret the right-hand side of Eq. ( IB. 14b as a projection on this eigenfunction, and 
evaluate the integral exactly. 

The eigenbasis of exp (tC) is defined by the Gegenbauer polynomials. As an example, 
the first three Gegenbauer polynomials on [0, 1], orthonormal under the i 2 -product with 
weight x(l — x), are 

(B.15) T Q (x) = V6, 

(B.16) T 1 (x)=y/m{l-2x), 
(B.17) T 2 (x) = V84(l-5x + 5x 2 ). 



NON-EQUILIBRIUM ALLELE FREQUENCY SPECTRA VIA SPECTRAL METHODS 
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The corresponding eigenvalues in exp (tC), are eigenvalues exp(— tj (2N e )), exp(— 3t/(2N e )), 
and exp(— 3t/N e ). Thus, Eq. ( IB. 14b is the same as 
(B 18) 

exp (tC) (0f(x)— <f)2{x))x(l~x)dx = [ exp (tC) (0f (x)— 0|(x)) ~^ ^ x(l—x)dx, 

Jo V6 

and 

(B.19) 

f 1 AN v 

exp(-t/(2N e )) / (tf(x) - <j>2(x))x(l - x)dx = — — exp(-t/(2N e )) + 0(e~ K ). 

Jo K 

As < exp (tC) (0f (x) - 0| (a:)) < 0? (x) - 0|(x) for all x E [0, 1] and for t > N e , we 

lastly compute Eq. (IB. 3b . as 

i ,i 

1 0i (a;, t) — <p2[x, t)\x(l — x)dx = / (0i(x) — <ftl(x))x(l — x)dx 

Jo 

(B.20) - / exp (tC) (0f (x) - <f> L 2 (xj)x(l - x)dx, 

Jo 

which is 

f 1 AN u 

(B.21) / \ ( f >1 ( x ,t)-<f> 2 {x 7 t)\x{l-x)dx = ^^(l-exp{-t/(2N e ))) + 0(e~ K ), 
Jo « 

as we wanted to show. 
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